-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: timedelta64(NaT) incorrectly treated as datetime in some dataframe ops #28049
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@TomAugspurger thoughts? lots of ops stuff in the works, getting really close to fixing the perf problem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Could a user hit this from normal code? (i.e. do we need a release note?)
Yes. e.g. the test this implements incorrectly raises TypeError on master. Will update with release note. |
@jreback gentle ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, just move the note to 1.0
doc/source/whatsnew/v0.25.2.rst
Outdated
@@ -18,7 +18,7 @@ Categorical | |||
|
|||
Datetimelike | |||
^^^^^^^^^^^^ | |||
|
|||
- Bug in :class:`DataFrame` arithmetic operations when operating with a :class:`Series` with dtype `'timedelta64[ns]'` (:issue:`28049`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can move to 1.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated+green
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor clarification point. Was about to merge but want to double check before.
Note I fixed up unneeded changes to the 0.25.2 whatsnew so would need to pull locally if a change required here (or edit directly in GH)
right = np.asarray(right) | ||
|
||
def column_op(a, b): | ||
return {i: func(a.iloc[:, i], b[i]) for i in range(len(a.columns))} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to double check - is the second argument to func
supposed to be accessed via .iloc
here as well?
Side note - an alternate approach for by column iteration is to call df.columns
; not sure if there is a perf difference but have seen both in code base
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the second argument to func supposed to be accessed via .iloc here as well?
No, at this point b
is an ndarray.
Side note - an alternate approach for by column iteration is to call df.columns; not sure if there is a perf difference but have seen both in code base
That runs in to difficulties if there are duplicate columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if there is a perf difference
Might be worth looking at using iat
instead of iloc
for perf
Sounds good - merge away
…Sent from my iPhone
On Sep 18, 2019, at 9:46 AM, jbrockmendel ***@***.***> wrote:
@jbrockmendel commented on this pull request.
In pandas/core/ops/__init__.py:
> @@ -499,8 +499,19 @@ def column_op(a, b):
# in which case we specifically want to operate row-by-row
assert right.index.equals(left.columns)
- def column_op(a, b):
- return {i: func(a.iloc[:, i], b.iloc[i]) for i in range(len(a.columns))}
+ if right.dtype == "timedelta64[ns]":
+ # ensure we treat NaT values as the correct dtype
+ # Note: we do not do this unconditionally as it may be lossy or
+ # expensive for EA dtypes.
+ right = np.asarray(right)
+
+ def column_op(a, b):
+ return {i: func(a.iloc[:, i], b[i]) for i in range(len(a.columns))}
not sure if there is a perf difference
Might be worth looking at using iat instead of iloc for perf
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.
|
lgtm merge on green |
@WillAyd gentle ping |
Thanks @jbrockmendel |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff